Cortical–subcortical interactions underlie processing of auditory predictions measured with 7T fMRI

Abstract Perception integrates both sensory inputs and internal models of the environment. In the auditory domain, predictions play a critical role because of the temporal nature of sounds. However, the precise contribution of cortical and subcortical structures in these processes and their interaction remain unclear. It is also unclear whether these brain interactions are specific to abstract rules or if they also underlie the predictive coding of local features. We used high-field 7T functional magnetic resonance imaging to investigate interactions between cortical and subcortical areas during auditory predictive processing. Volunteers listened to tone sequences in an oddball paradigm where the predictability of the deviant was manipulated. Perturbations in periodicity were also introduced to test the specificity of the response. Results indicate that both cortical and subcortical auditory structures encode high-order predictive dynamics, with the effect of predictability being strongest in the auditory cortex. These predictive dynamics were best explained by modeling a top–down information flow, in contrast to unpredicted responses. No error signals were observed to deviations of periodicity, suggesting that these responses are specific to abstract rule violations. Our results support the idea that the high-order predictive dynamics observed in subcortical areas propagate from the auditory cortex.

The estimation of d' can be thought of a mixture model where each observation i from participant j is drawn from two Bernoulli distributions.In the first Bernoulli process, we model with likelihood θ1ij that position 1 is chosen over positions 2 and 3 (i.e.y1ij = 1).In the second Bernoulli process, we model with likelihood θ2ij that position 2 is chosen over positions 1 and 3 (i.e.y2ij = 1): with Zij = X1ij -X2ij; Z1ij = 1 -2X1ij -X2ij; Z2ij = 1 -X1ij -2X2ij; and X1ij and X2ij are coding whether the signal is in position 1 or 2 (or none), respectively (i.e.position 1: X1ij = 1, X2ij = 0; position 2: X1ij = 0, X2ij = 1; position 3: X1ij = 0, X2ij = 0).e1ij and e2ij represent random variation, or noise, in the perception for each Bernoulli process.d' is shared across both Bernoulli processes, and is defined by an unknown grand mean α and individual deflections of participant νj, the latter drawn from a normal distribution centered at zero and unknown hierarchical standard deviation σ: Finally, the unknown parameters of the mixture model are estimated from the data using approximation and assuming weakly informative prior distributions.We chose weakly informative prior distributions as an agnostic enough choice that would nonetheless properly define the parameter space: 1 st level analysis of brain data In order to obtain each participant's and ROI's neural response to each condition of interest the mean estimates and their associated measurement error (i.e. standard error of the estimate) were extracted using SPM and MarsBar (Penny et al., 2011;Brett et al., 2002).For simplicity, we used the classical restricted maximum likelihood estimation procedure to estimate the model parameters.In addition to the conditions and regressors in each model, other details include timing parameters and specific transformations to the data.
Regarding timing parameters, the units per design and onsets were specified in seconds with an interscan interval set to the repetition time of the image acquisition (TR = 1.53).The micro-time resolution (i.e. the number of bins with which each scan is sampled when convolving regressors) was set to the number of slices per scan (nSlices = 48); and the micro-time onset (i.e. the reference time-bin to which the regressors are offset) was set to the first slice of each scan (i.e.no offset).
Regarding data transformations, the default SPM high-pass filter of 128 seconds and auto-correlation whitening of order 1 -AR(1)-were applied.Regressors were convolved using SPM's canonic hemodynamic response function.Each ROI time course y was estimated separately with global normalization across session.This type of scaling maximizes sensitivity by multiplying each signal value Y at time t in session s by the constant value 100 divided by the average amplitude of that session µ (Andersson et al., 2001): Once the model parameters were estimated, contrasts were defined for each condition of interest in all sessions with a contrast weight of 1/nSession and every other condition zeroed out.We then finally extracted from MarsBar each condition's mean estimate and their associated measurement error, computing the latter by dividing the mean estimates by their corresponding t-values returned by MarsBar.

st level analysis of brain data
In order to analyze the data at the group level while preserving the precision of the 1 st level analyses we used a mixed-effects multilevel analysis (MEMA).This is a sensible improvement from the classical twostage analysis, where 1 st level estimates are reduced to summary statistics and are ultimately treated as mere observations in a classical test.While this approach is simple and widely used, it comes at the expense of losing valuable information.This is especially critical in designs studying basic physiological phenomena (e.g.neural responses to stimuli), where the statistical precision depends on the number of trials, and each participant is considered, put simply, a replication of the same experiment.In such cases, it is clear to see how such an experiment with a priori sufficient observations (e.g.nTrials = 216 per nSubjects = 10) gets reduced to an insufficient sample size if the classical two-step approach is employed.On the other hand, performing inferences at every individual level separately could incur in atomistic fallacies, in addition to offering an inelegant solution.A popular choice in such cases is a MEMA.This method allows to preserve the 1 st levels' precision while still drawing inferences at the group level in aggregate, thus maximizing statistical power (Chen et al., 2012).
While the most widely used approach for MEMA is full hierarchical models -where every observation from every individual is pooled-this is a computationally expensive approach in the case of fMRI analysis, given the complexity of the models.An alternative approach with very similar results is to still analyze the data in two stages but incorporating the measurement error of the 1 st level estimates into the 2 nd level (Chen et al., 2012).This is achieved by estimating meta-analytically the "true" group-level effect from each individual observed effect.In our case, each observed effect y comes from a given condition i and a given participant j.Since we have four dependent variables (i.e.ROIs), we model them as correlated with a multivariate gaussian likelihood function: Where µij is a vector of true population means (for all four ROIs) and ij a covariance matrix: With each ρ quantifying the unknown correlation between two given outcomes (i.e.ROIs) and each σij pooling the unknown population error τ with each measurement error MEij: In turn, and since we are modeling repeated measures (i.e.experimental conditions), each population mean θij is defined by an unknown grand mean α and individual deflections of experimental condition νi and participant νj: The latter drawn from a normal distribution centered at zero and unknown hierarchical standard deviations  and :  1 , ~ (0,  1 );  2 ~ (0,  2 );  3 ~ (0,  3 );  4 ~ (0,  4 )  1 , ~ (0,  1 );  2 ~ (0,  2 );  3 ~ (0,  3 );  4 ~ (0,  4 ) Finally, the unknown parameters are estimated from the data using approximation and assuming weakly informative prior distributions: ] ~ (3)

Effective connectivity analysis of initial tone detection
In order to assess the directionality of initial tone detection along the auditory pathway we estimated and compared two path analyses (McIntosh and Gonzalez-Lima, 1994) with opposing directions.These models were fit on the first tone data only (i.e.std0).These models consisted of four independent regression lines where the average neural response to the stimulus was estimated for a given ROI from activity in the preceding ROI, or its average activity in the case of the first path in the model.Similar to a MEMA, since the 1 st level estimates error was observed, we fitted the models with measurement error for both responses and predictors (McElreath, 2020).In both cases, we first modeled the neural response per participant j in ROI 1 with Gaussian likelihood: Where the population mean θ1 is given by the unknown average α1: And standard deviation σ1j pools the unknown population error τ1 with each measurement error ME1j: Next in the path, the neural response in ROI 2 y2j is modeled with measurement error ME2j: Where β1 is the unknown weight of the slope and the predictor X1j represents the "true" neural response in ROI 1, which is found by modeling the observed neural response x1j with measurement error ME1j with Gaussian likelihood: Likewise, in the remaining two paths the neural response in ROIs 3 and 4 are predicted from the true neural response in ROIs 2 and 3 with measurement error, respectively: Since the true predictor values X1j, X2j and X3j are internally handled as latent variables, these are drawn from a multivariate normal distribution with unknown means µ, standard deviations  and correlation ρ: Finally, the unknown parameters are estimated from the data using approximation and assuming weakly informative prior distributions: The intercept only model consisted in the same set of regressions, but all ROIs activity were predicted from their average only, zeroing out all βYi terms: -Intercept-only model: o 1 st path: Average IC activity o 2 nd path: Average MGB activity o 3 rd path: Average AC activity o 4 th path: Average STG activity

Effective connectivity analysis of predictive suppression
In order to assess the directionality of the deviant position effect along those ROIs showing predictive suppression we estimated and compared two path analyses with opposing directions.These models were fit on the deviant data only, after standardizing observations per participant and ROI.These models consisted of three independent regression lines with measurement error where the neural response to the stimulus was estimated for a given ROI from activity in the preceding ROI, or the position of the deviant in the case of the first path in the model.In both cases, we first modeled the neural response i per participant j in ROI 1 with Gaussian likelihood: Where the population mean θ1ij is given by the following regression equation: And the standard deviation σ1ij pools the unknown population error τ1 with each measurement error ME1ij: In the regression equation, α1 is the unknown intercept, X0ij is the deviant position, β0 is the unknown weight of the slope, and ν1j accounts for varying intercepts per participant, the latter drawn from a normal distribution centered at zero with unknown hierarchical standard deviation 1: Next in the path, the neural response in ROI 2 y2ij is also modeled with measurement error ME2ij: Where the predictor X1ij represents the "true" neural response in ROI 1, which is found by modeling the observed neural response x1ij with measurement error ME1ij with Gaussian likelihood: Likewise, in the remaining path the neural response in ROIs 3 is predicted from the true neural response in ROIs 2 with measurement error, respectively: Since the true predictor values X1ij and X2ij are internally handled as latent variables, these are drawn from a multivariate normal distribution with unknown means µ, standard deviation  and correlation ρ: Finally, the unknown parameters are estimated from the data using approximation and assuming weakly informative prior distributions: The Analysis of Functional Brain Images.Elsevier.• Watanabe S, Opper M (2010) Asymptotic equivalence of Bayes cross validation and widely applicable information criterion in singular learning theory.J Mach Learn Res.
1 ~ ( 1 )  2 ~ ( 2 )θ1ij and θ2ij are in turn given by the SDT model for the 3-AFC task: ′   2 +  2 )( 2 )′ 2 relation between outcomes and predictors in the two competing models are: -Bottom-up model: o 1 st path: Deviant position → IC activity o 2 nd path: IC activity → MGB activity o 3 rd path: MGB activity → AC activity -Top-down model: o 1 st path: Deviant position → AC activity o 2 nd path: AC activity → MGB activity o 3 rd path: MGB activity → IC activity The intercept only model consisted in the same set of regressions, but all ROIs activity were predicted from their average only (plus per-participant deflections from average), zeroing out all βYi terms: -Intercept-only model: o 1 st path: Average IC activity o 2 nd path: Average MGB activity o 3 rd path: Average AC activity References • Andersson JLR, Ashburner J, Friston K (2001) A global estimator unbiased by local changes.Neuroimage 13(6):1193-1206.• Brett M, Anton J-L, Valabregue R, Poline J-B, Others (2002) Region of interest analysis using an SPM toolbox.In: 8th international conference on functional mapping of the human brain, pp 497.Sendai.• Chen G, Saad ZS, Nath AR, Beauchamp MS, Cox RW (2012) FMRI group analysis combining effect estimates and their variances.Neuroimage 60:747-765.• DeCarlo LT (2012) On a signal detection approach to m-alternative forced choice with bias, with maximum likelihood and Bayesian approaches to estimation.J Math Psychol 56(3):196-207.• Kruschke J (2015) Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan.Elsevier.• Makowski D, Ben-Shachar MS, Chen SHA, Lüdecke D (2019) Indices of Effect Existence and Significance in the Bayesian Framework.Front Psychol 10:2767.• McElreath R (2018) Statistical rethinking: A Bayesian course with examples in R and Stan.Chapman and Hall/CRC.• McIntosh AR, Gonzalez-Lima F (1994) Structural modeling of functional neural pathways mapped with 2-deoxyglucose: effects of acoustic startle habituation on the auditory system.Brain Res 547:295-302.• Penny WD, Friston KJ, Ashburner JT, Kiebel SJ, Nichols TE (2011) Statistical Parametric Mapping: